A Framework for Classifying Unstructured Data of Cardiac Patients: A Supervised Learning Approach

Authors

  • Iqra Basharat
  • Ali Raza Anjum
  • Usman Qamar
  • Shoab Ahmed Khan
Abstract

Data mining has recently emerged as an important field for extracting useful knowledge from large volumes of unstructured and seemingly useless data. In health organizations it has particularly high potential for uncovering unknown patterns in datasets and for disease prediction. Very little work of this kind has been done for cardiovascular patients in Pakistan. In this study, we propose a framework that uses a supervised classification approach to assign unstructured data of cardiac patients of the Armed Forces Institute of Cardiology (AFIC), Pakistan, to four important classes. The focus of the study is on manually structuring the unstructured medical data and reports, as no structured database was available for the specific data under study. Multinomial Logistic Regression (LR) is used to perform the multiclass classification, and 10-fold cross-validation is used to validate the classification models and to analyze the results and performance of the Logistic Regression models. The performance criteria used are precision, F-measure, sensitivity, specificity, classification error, area under the curve, and accuracy. This study provides a road map for future research in the field of bioinformatics in Pakistan.

Keywords: bioinformatics; classification techniques; heart disease in Pakistan; heart disease prediction; multinomial classification; logistic regression
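As a rough illustration of the evaluation criteria named in the abstract, the following sketch derives one-vs-rest precision, sensitivity, specificity, and F-measure, plus overall accuracy and classification error, from a four-class confusion matrix. The function names and the matrix values are hypothetical and chosen only for illustration; they are not results from the study, and area under the curve is omitted because it requires predicted scores rather than counts.

```python
from typing import Dict, List


def per_class_metrics(confusion: List[List[int]]) -> List[Dict[str, float]]:
    """One-vs-rest metrics for each class of a square confusion matrix.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                          # true k, predicted other
        fp = sum(confusion[i][k] for i in range(n)) - tp     # predicted k, true other
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0     # also called recall
        specificity = tn / (tn + fp) if tn + fp else 0.0
        denom = precision + sensitivity
        f_measure = 2 * precision * sensitivity / denom if denom else 0.0
        results.append({
            "precision": precision,
            "sensitivity": sensitivity,
            "specificity": specificity,
            "f_measure": f_measure,
        })
    return results


def accuracy(confusion: List[List[int]]) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


# Hypothetical 4-class confusion matrix (illustrative values only).
example = [
    [8, 1, 1, 0],
    [2, 6, 1, 1],
    [0, 1, 9, 0],
    [0, 2, 0, 8],
]

metrics = per_class_metrics(example)
acc = accuracy(example)
classification_error = 1.0 - acc

for label, m in enumerate(metrics):
    print(label, m)
print("accuracy:", acc, "classification error:", classification_error)
```

In a 10-fold cross-validation setting, one common choice is to sum the confusion matrices of the ten held-out folds and compute these metrics once on the pooled matrix; averaging per-fold metrics is an equally valid alternative.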


Similar resources

A Self-supervised Learning Framework for Classifying Microarray Gene Expression Data

It is important to develop computational methods that can effectively resolve two intrinsic problems in microarray data: high dimensionality and small sample size. In this paper, we propose a self-supervised learning framework for classifying microarray gene expression data using the Kernel Discriminant-EM (KDEM) algorithm. This framework applies self-supervised learning techniques in an optimal no...


Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been used in clustering problems. Previous research showed that this theory can significantly increase the stability and performance of...



Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

A data stream is a sequence of data generated from various information sources at high speed and high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, the challenge of unlimited stream length is commonly met by dividing the stream into fixed-size windows or by using gradual forgetting. Concept drift refer...


Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk

This study explores a semi-supervised classification approach that uses random forest as a base classifier to classify the risk of low-back disorders (LBDs) associated with industrial jobs. The semi-supervised classification approach uses unlabelled data together with a small amount of labelled data to create a better classifier. The results obtained by the proposed approach are compared with those o...



Publication date: 2016